Eliciting Additive Reward Functions for Markov Decision Processes
نویسندگان
چکیده
Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decisiontheoretically sound heuristics for eliciting rewards for factored MDPs whose reward functions exhibit additive independence. Since we can often find good policies without complete reward specification, we also develop new (exact and approximate) algorithms for robust optimization of imprecise-reward MDPs with such additive reward. Our methods are evaluated in two domains: autonomic computing and assistive technology.
منابع مشابه
Exact Distributions for Reward Functions on Semi-markov and Markov Additive Processes
The distribution theory for reward functions on semi-Markov processes has been of interest since the early 1960s. The relevant asymptotic distribution theory has been satisfactorily developed. On the other hand, it has been noticed that it is difficult to find exact distribution results which lead to the effective computation of such distributions. Note that there is no satisfactory exact distr...
متن کاملCOVARIANCE MATRIX OF MULTIVARIATE REWARD PROCESSES WITH NONLINEAR REWARD FUNCTIONS
Multivariate reward processes with reward functions of constant rates, defined on a semi-Markov process, first were studied by Masuda and Sumita, 1991. Reward processes with nonlinear reward functions were introduced in Soltani, 1996. In this work we study a multivariate process , , where are reward processes with nonlinear reward functions respectively. The Laplace transform of the covar...
متن کاملMarkov Decision Processes with Functional Rewards
Markov decision processes (MDP) have become one of the standard models for decision-theoretic planning problems under uncertainty. In its standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that sev...
متن کاملRegret-based Reward Elicitation for Markov Decision Processes
The specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be ...
متن کامل